Severe acute respiratory syndrome (SARS) coronavirus infection and growth are dependent on initiating 
signaling and enzyme actions upon viral entry into the host cell. Proteins packaged during virus assembly may 
subsequently form the first line of attack and host manipulation upon infection. A complete characterization 
of virion components is therefore important to understanding the dynamics of early stages of infection. Mass 
spectrometry and kinase profiling techniques identified nearly 200 incorporated host and viral proteins. We 
used published interaction data to identify hubs of connectivity with potential significance for virion formation. 
Surprisingly, the hub with the most potential connections was not the viral M protein but the nonstructural 
protein 3 (nsp3), which is one of the novel virion components identified by mass spectrometry. Based on new 
experimental data and a bioinformatics analysis across the Coronaviridae, we propose a higher-resolution 
functional domain architecture for nsp3 that determines the interaction capacity of this protein. Using 
recombinant protein domains expressed in Escherichia coli, we identified two additional RNA-binding domains 
of nsp3. One of these domains is located within the previously described SARS-unique domain, and there is a 
nucleic acid chaperone-like domain located immediately downstream of the papain-like proteinase domain. We 
also identified a novel cysteine-coordinated metal ion-binding domain. Analyses of interdomain interactions 
and provisional functional annotation of the remaining, so-far-uncharacterized domains are presented. Overall,
the ensemble of data surveyed here paints a more complete picture of nsp3 as a conserved component of the
viral protein processing machinery, which is intimately associated with viral RNA in its role as a virion
component.


The severe acute respiratory syndrome coronavirus (SARS-CoV)
is an enveloped virus with a 29.7-kb positive-strand RNA
genome (35). Replication of this genome and transcription are
mediated by a large membrane-anchored RNA processing
complex. Components of this complex are derived from the 16
nonstructural proteins (nsp1 to nsp16) that are processed from
the open reading frame 1a (ORF1a) and ORF1b. The polyprotein
1a (pp1a) is translated from ORF1a, while the polyprotein
1ab (pp1ab) is formed by a −1 ribosomal frameshift upstream
of the ORF1a stop codon, causing read-through into ORF1b.
SARS-CoV encodes two proteinases, a “main proteinase”
(nsp5) and a papain-like proteinase (PL2pro domain of nsp3).
These two proteins proteolytically cleave pp1a and pp1ab into
the 16 mature nsp’s (61). Specifically, SARS-CoV PL2pro
cleaves pp1a at the three sites 177LNGG↓AVT183, 815LKGG↓API821,
and 2737LKGG↓KIV2743 to release nsp1, nsp2,
and nsp3, respectively.

In current coronavirus terminology, the term “nonstructural 
protein” usually refers to peptides processed from pp1a and
pp1ab, while “structural protein” refers to the N, M, S, and E
proteins, which interact to coordinate the structure of the 
virion lipidic envelope (39). The term “accessory protein” 
refers to group- or subgroup-specific proteins, some of which 
may be incorporated in virions. A typical virion may contain 
the viral RNA genome, plus tens to hundreds of copies of N, 
M, and S proteins; a few E proteins (16); and an unknown but 
presumably small quantity of accessory proteins such as the 
SARS-CoV ORF3a (22), ORF6 (21), ORF7a (20), and 
ORF7b (51) proteins. Incorporation of the accessory ORF9b 
protein can be inferred from incorporation of the homologous 
I protein of murine hepatitis virus (MHV) (13). Furthermore, 
our recent electron cryomicroscopy (cryo-EM) analysis of 
coronavirus ultrastructure (39) revealed that the viral ribonucleoprotein
is sufficiently loosely packed in the virion core to
leave ample space for possible additional incorporation of host 
proteins (56). 

Here we used mass spectrometry proteomics and protein
kinase profiling techniques to probe the contents of purified
SARS-CoV virions. We investigated cellular pathways involved
in coronavirus assembly, and we expected our experimental
approach to identify novel virus component-host protein interactions
important to virogenesis. We attempted to bias the
analysis toward identification of biologically significant host
proteins by subtracting proteins purified from uninfected cells,
proteins identified with only one sample preparation method,
and proteins occurring only on the proteolytically sensitive
surface of the virion during the analysis. One hundred seventy-two
host proteins and eight viral proteins meeting these criteria
are described here, including three nsp’s. Network analysis (2)
based on previously reported biochemical interaction mapping
(65) revealed several hubs of connectivity (we use the term
“hub of connectivity” or “hub” to refer to molecular species
showing an outstandingly large number of intermolecular interactions)
among incorporated components of viral origin.
Among the hubs with the most connections are the viral M
protein, the RNA genome, and nsp3. The M protein links the
other major virion components at the site of budding (35), and
an integral role for the RNA genome in assembly had been
anticipated (15). nsp3, however, which is the protein capable of
making the most connections to other virus-encoded components
of the virion, had not previously been implicated in
coronavirus assembly. We therefore selected nsp3 for further
functional and structural characterization.

TABLE 1. Recombinant expression of nsp3 domains

| Construct | Boundary positions | Vector | Tag | E. coli strain | Reference |
|---|---|---|---|---|---|
| UB1 | 1-112 | pET25b | Tagless | BL21(DE3)RIL | 53 |
| UB1-AC | 1-183 | pET25b | Tagless | BL21(DE3)RIL | 53 |
| ADRP | 184-365 | pMHIF | His tag | DL41 | 47 |
| SUD | 389-726 | pET28aTEV | Cleavable His tag | Rosetta pLysS | |
| SUD451-651 | 451-651 | pET28aTEV | Cleavable His tag | Rosetta pLysS | |
| SUD-C | 513-651 | pET28b | Cleavable His tag | BL21(DE3) | 7 |
| UB2-PL2pro | 723-1037 | pET11a | Tagless | BL21(DE3) | 45 |
| NAB | 1066-1225 | pMHIF | His tag | DL41 | |

SARS-CoV nsp3 is a large multidomain protein that includes
confirmed proteinase and poly(ADP-ribose) binding
domains. We present here an updated nsp3 phylogeny and
domain map, including novel validated metal ion-binding and
nucleic acid-binding domains. We also describe the use of
relative conservation data to infer functional information for
the remaining uncharacterized nsp3 domains. We interpret
these data in light of recent functional and structural characterizations
of nsp3 domains (45, 47, 53), which leads us to
suggest an important role for nsp3 in coronavirus RNA synthesis
and virogenesis.

MATERIALS AND METHODS 

SARS-CoV growth, purification, and treatment. SARS-CoV Tor2 was cultured 
in Vero-E6 cells, which are derived from the African green, or vervet, monkey 
Cercopithecus aethiops. Vero-E6 cells were selected for high viral growth rate and 
reproducibility of infection. Cells were inoculated at a high multiplicity (~1 to 3 
PFU/cell), medium was exchanged after 24 h, and high-titer infectious supernatant
was collected 48 h after inoculation. Viral supernatants were clarified by
centrifugation at 12,000 × g for 30 min, collected by precipitation with 8%
polyethylene glycol 8000 and 2% NaCl, and banded at 140,000 × g for 1.5 h on
discontinuous five-step 10% to 50% sucrose gradients. Purified native virus was
collected by side puncture and pelleted through HEPES-buffered 0.9% saline
(pH 7.0). At this point, aliquots representing virus purified from about 1 liter of
infectious supernatant were treated with 5,000 U DNase I (New England Biolabs)
for 1 h at 37°C in the supplied DNase I buffer to remove any adherent host
chromatin and associated proteins, followed by 60 mg proteinase K (New England
Biolabs) for 1 h at 37°C. Proteinase K treatments were not performed in
the presence of a detergent in order to preserve the integrity of the viral membrane.
Proteinase K was then removed by pelleting virus through a 30% sucrose
cushion. Native and enzymatically treated virus preparations were lysed and
inactivated with 1% Triton X-100 (for kinase assays), followed by boiling for 5
min (for mass spectrometry). The concentration of detergent was reduced by
pelleting denatured protein aggregates through HEPES-buffered saline.

Infectious SARS-CoV in this study was purified by density gradient banding.
Banded viruses are expected to be more pure than viruses purified by
pelleting through a discontinuous 10 to 30% sucrose cushion, as was done in 
our previous cryo-EM study of SARS-CoV supramolecular architecture (39). 
Analysis of a representative portion of that set of cryo-EM images containing 
1,018 enveloped particles from pelleted SARS-CoV revealed 42 particles not 
visibly recognizable as SARS-CoV (4% of the total) and eight apparently 
empty vesicles (1%), which are not expected to contribute a significant 
amount of protein to the mass spectrometry analysis. The purity of the 
SARS-CoV used for mass spectrometry and kinase analysis was therefore 
estimated to be greater than or equal to 95%. 
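
The purity estimate above is a matter of counting particles. A minimal sketch of that arithmetic, using only the particle counts reported in the text (the variable and function names are illustrative and not part of the original analysis):

```python
# Particle counts reported for the cryo-EM image subset of pelleted SARS-CoV.
total_particles = 1018
unrecognizable = 42   # enveloped particles not recognizable as SARS-CoV
empty_vesicles = 8    # apparently empty vesicles


def percent(count, total):
    """Return count expressed as a percentage of total."""
    return 100.0 * count / total


contaminant_pct = percent(unrecognizable + empty_vesicles, total_particles)
purity_pct = 100.0 - contaminant_pct

print(f"unrecognizable particles: {percent(unrecognizable, total_particles):.1f}%")
print(f"empty vesicles: {percent(empty_vesicles, total_particles):.1f}%")
print(f"recognizable SARS-CoV particles: {purity_pct:.1f}%")
```

Because the banded virus used here is expected to be purer than the pelleted virus that was counted, this figure is best read as a lower bound.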

Protein construct design, cloning, expression, and purification. SARS-CoV 
nsp3 (GenBank accession number NP_828862) extends from nucleotides 2719 to 
8484, corresponding to residues Ala907 to Gly2828 of pp1a. A summary of
selected nsp3 expression constructs and conditions is shown in Table 1. Expression
of several nsp3 domains has been described previously (3, 47, 53). The
UB2-PL2pro expression construct was a kind gift from Andrew Mesecar (University
of Illinois at Chicago). All other constructs were amplified by PCR from
genomic cDNA of the SARS-CoV Tor2 strain. Amplification primers were
designed to produce the constructs listed in Table 1. Amplicons were cloned into
the expression vectors pMHIF (N-terminal His6 Thio6 tag; derivative of pBAD
from Invitrogen), pET25b (tagless construct), pET28b (thrombin-cleavable
N-terminal His6 tag), or pET28aTEV (tobacco etch virus protease-cleavable
N-terminal His6 tag).

For expression of all constructs in Table 1 except SUD-C and UB1, a
sequence-verified clone was transformed into Escherichia coli, and an overnight
culture from a fresh transformant was used to inoculate flasks of LB medium
containing antibiotic. Cultures were grown at 37°C with vigorous shaking to an
optical density at 600 nm of 0.6 to 0.8, induced as needed, and grown at 14°C
overnight. Bacteria were harvested by centrifugation and lysed by sonication in a
buffer containing 50 mM potassium phosphate, pH 7.8, 300 mM NaCl, 10%
glycerol, 5 mM imidazole, 0.5 mg/ml lysozyme, 100 µl/liter benzonase, and
EDTA-free protease inhibitor (Roche; one tablet per 50 ml buffer). The lysate
was clarified by ultracentrifugation at 45,000 rpm for 20 min at 4°C, and the
soluble fraction was applied onto a metal chelate column (Talon resin charged
with cobalt; Clontech). The column was washed with a solution containing 20
mM Tris, pH 7.8, 300 mM NaCl, 10% glycerol, and 5 mM imidazole and eluted
in buffer containing 25 mM Tris, pH 7.8, 300 mM NaCl, and 150 mM imidazole.
The eluate was then purified by anion exchange on a Poros HQ column using a
linear gradient of NaCl (0 to 1 M) in 25 mM Tris-HCl, pH 8.0. Tobacco etch virus
protease was added to proteins with cleavable tags in a 1:50 molar ratio. After
incubation overnight at 4°C, the cleaved tags and uncleaved proteins were captured
by a Talon resin column, and the flowthrough was concentrated and further
purified by size-exclusion chromatography on a Superdex 75 column equilibrated
with 10 mM Tris, pH 7.8, 150 mM NaCl. Pure fractions were concentrated and
either used immediately for assays or flash-frozen in liquid nitrogen. SUD-C was
produced as described in reference 7. UB1 was produced as described in reference
53.

Metal ion-binding assay. Purified proteins were not actively stripped of metal
ions before analysis; rather, proteins were selected that did not measurably strip
CoCl2 from the Talon affinity matrix at the time of purification. Ten-micromolar
SUD-C, SUD451-651, or full-length SUD solutions were mixed with CoCl2 to final
concentrations ranging from 0 to 50 µM Co(II) in a buffer containing 25 mM Tris
at pH 7.8 and 300 mM NaCl. Samples were incubated on ice for 30 min, and
absorption spectra from 250 to 800 nm were then recorded on a Cary UV-Vis
spectrophotometer. Matched baseline spectra from samples containing only
buffer and CoCl2 were subtracted from the absorption spectra of the
protein-containing samples. Zn(II) titration was performed by recording optical spectra
after addition of ZnCl2 following incubation with 50 µM CoCl2.
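
The baseline correction described above amounts to a point-by-point subtraction of the matched buffer-plus-CoCl2 spectrum from each protein spectrum. A minimal sketch, assuming each spectrum is held as a mapping from wavelength (nm) to absorbance; the function name and all absorbance values below are hypothetical and are not measured data:

```python
def subtract_baseline(sample, baseline):
    """Subtract a matched baseline spectrum from a sample spectrum.

    Both arguments map wavelength in nm to absorbance and must have been
    recorded at the same wavelengths (an assumption of this sketch).
    """
    if sample.keys() != baseline.keys():
        raise ValueError("spectra must be recorded at the same wavelengths")
    return {wl: sample[wl] - baseline[wl] for wl in sorted(sample)}


# Hypothetical readings for one Co(II) concentration, for illustration only.
protein_plus_cobalt = {350: 0.120, 650: 0.045, 700: 0.030}
buffer_plus_cobalt = {350: 0.020, 650: 0.005, 700: 0.004}

difference = subtract_baseline(protein_plus_cobalt, buffer_plus_cobalt)
for wavelength, absorbance in difference.items():
    print(f"{wavelength} nm: {absorbance:.3f}")
```

Repeating the subtraction at each Co(II) concentration, each with its own matched baseline, leaves only the absorbance that depends on the protein.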

Extraction of viral proteins and digestion. Native SARS-CoV, enzymatically 
treated SARS-CoV, and host background protein samples were divided into two 
identical parts, one used for trichloroacetic acid (TCA) precipitation and the 
other for methanol delipidation. For TCA precipitation, TCA was added to the 
sample to a final content of 25% (vol/vol). The sample was then placed on ice for 
30 min and centrifuged at 13,000 × g for 5 min. The pellet was twice washed with
cold acetone to ready it for the next step. For methanol delipidation, 2.5 volumes
of methanol, 0.25 volume of chloroform, and 0.5 volume of water were added.
The sample was then centrifuged at 16,000 × g for 2 min, and the organic layer
was removed. After back extraction with 3 volumes of methanol, the sample was
centrifuged at 16,000 × g for 2 min to obtain a pellet. The resulting pellets from
the two extraction conditions were separately solubilized in Invitrosol (Invitrogen,
Carlsbad, CA), sonicated for 30 min, and reduced with
tris(2-carboxyethyl)phosphine, and the cysteines were alkylated with iodoacetamide.
Acetonitrile was then added to a final content of 80% (vol/vol). Finally, the sample was
digested with trypsin (enzyme/substrate ratio of 1:50 [wt/wt]) at 37°C overnight. 

Mass spectrometry analysis of viral proteins. The protein digest from each 
sample was analyzed by Multidimensional Protein Identification Technology 
(MudPIT) (69). Briefly, digested proteins were pressure loaded onto a fused
silica capillary column packed with a 3-cm, 5-µm Partisphere strong cation
exchanger (SCX; Whatman, Clifton, NJ) and 3-cm, 5-µm Aqua C18 material
(RP; Phenomenex, Ventura, CA), with a 2-µm filter union (UpChurch Scientific,
Oak Harbor, WA) attached to the SCX end. The column was washed with buffer
containing 94.9% water, 5% acetonitrile, and 0.1% formic acid. After desalting,
a 100-µm-inside-diameter capillary with a 5-µm pulled tip packed with 10-cm,
3-µm Aqua C18 material was attached to the filter union, and the entire split
column was placed in line with an Agilent 1100 quaternary high-pressure liquid
chromatograph (Agilent, Palo Alto, CA) and analyzed using a modified 11-step
separation (66). Three buffer solutions were used: 5% acetonitrile-0.1% formic
acid (buffer A), 80% acetonitrile-0.1% formic acid (buffer B), and 500 mM
ammonium acetate-5% acetonitrile-0.1% formic acid (buffer C). The first step
consisted of a 100-min gradient from 0 to 100% buffer B. Steps 2 to 10 had the
following profile: 3 min of 100% buffer A, 5 min of X% buffer C, a 10-min
gradient from 0 to 15% buffer B, and a 97-min gradient from 15 to 45% buffer
B. The 5-min buffer C percentages (X) were 5, 10, 15, 20, 25, 30, 40, 55, and 75%,
respectively. In the final step, the gradient contained 3 min of 100% buffer A, 20
min of 100% buffer C, a 10-min gradient from 0 to 15% buffer B, and a 107-min
gradient from 15 to 100% buffer B. As peptides were eluted from the microcapillary
column, they were electrosprayed directly into an LTQ linear ion trap
mass spectrometer (ThermoFinnigan, San Jose, CA) with the application of a
distal 2.4-kV spray voltage. A cycle of one full-scan mass spectrum (400 to 1,400
m/z) followed by five data-dependent tandem mass spectrometry (MS/MS) spectra
at a 35% normalized collision energy was repeated continuously throughout
each step of the multidimensional separation. 
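
The 11-step separation described above can be written out as a small schedule, which makes the step count and the total gradient time easy to check. This is a sketch only: the data layout and function names are assumptions, and any re-equilibration or dead time between steps, which the text does not describe, is ignored.

```python
# Buffer C percentages (X) for steps 2 to 10, as listed in the text.
SALT_PULSES = [5, 10, 15, 20, 25, 30, 40, 55, 75]


def build_schedule():
    """Return a list of (step number, [(segment description, minutes), ...])."""
    schedule = [(1, [("gradient 0-100% B", 100)])]
    for step, x in enumerate(SALT_PULSES, start=2):
        schedule.append((step, [
            ("100% A", 3),
            (f"{x}% C", 5),
            ("gradient 0-15% B", 10),
            ("gradient 15-45% B", 97),
        ]))
    schedule.append((11, [
        ("100% A", 3),
        ("100% C", 20),
        ("gradient 0-15% B", 10),
        ("gradient 15-100% B", 107),
    ]))
    return schedule


def step_minutes(segments):
    """Total duration of one step in minutes."""
    return sum(minutes for _, minutes in segments)


schedule = build_schedule()
total_minutes = sum(step_minutes(segments) for _, segments in schedule)
print(f"{len(schedule)} steps, {total_minutes} min ({total_minutes / 60:.2f} h)")
```

Under these assumptions each of steps 2 to 10 lasts 115 min and the final step 140 min, so a full run occupies the instrument for roughly 21 h per sample.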

Processing of mass spectra. MS/MS spectra were analyzed using the following 
software analysis protocol. Poor-quality spectra were removed from the data set 
using an automated spectrum quality assessment algorithm (4). MS/MS spectra 
remaining after filtering were searched with the SEQUEST algorithm (12) 
against a combined human, SARS-CoV, and vervet monkey database from NCBI 
that was concatenated to a decoy database in which the sequence for each entry 
in the original database was reversed. SEQUEST results were assembled and 
filtered using the DTASelect program (60) with a peptide false-positive rate of 
5%. To increase the probability of identifying viral proteins while simultaneously 
maintaining reasonably high filtering criteria, proteins with one peptide hit were 
accepted, but we required all peptides identified to be fully tryptic. 
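
Two ideas in this paragraph lend themselves to a short illustration: building a decoy database by reversing each entry, and requiring peptides to be fully tryptic. The sketch below is not the SEQUEST or DTASelect implementation. The sequences are toy examples, the `DECOY_` prefix is a hypothetical naming convention, the rate estimate (decoy matches divided by target matches) is one common convention that the text does not confirm, and the tryptic check ignores the proline exception.

```python
def reversed_decoys(database):
    """Build a decoy entry for each target entry by reversing its sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in database.items()}


def estimated_false_positive_rate(matched_entries):
    """Estimate the false-positive rate as decoy matches / target matches."""
    decoys = sum(1 for name in matched_entries if name.startswith("DECOY_"))
    targets = len(matched_entries) - decoys
    if targets == 0:
        raise ValueError("no target matches")
    return decoys / targets


def is_fully_tryptic(protein, start, end):
    """True if protein[start:end] has a tryptic cleavage site at both ends.

    A peptide end counts as tryptic if it follows K or R or coincides with
    a protein terminus. The proline rule is deliberately ignored here.
    """
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or protein[end - 1] in "KR"
    return n_term_ok and c_term_ok


# Toy target database and search space (sequences are made up).
targets = {"toy_protein": "MKTAYIAKQRQISFVK"}
search_space = {**targets, **reversed_decoys(targets)}

# Toy list of accepted peptide matches: 20 target hits and 1 decoy hit.
matches = ["toy_protein"] * 20 + ["DECOY_toy_protein"]
print(estimated_false_positive_rate(matches))

print(is_fully_tryptic(targets["toy_protein"], 2, 8))  # peptide TAYIAK
print(is_fully_tryptic(targets["toy_protein"], 3, 8))  # peptide AYIAK
```

Requiring both peptide ends to be tryptic partly offsets the looser one-peptide-per-protein criterion, because a random match is less likely to satisfy both constraints.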

Bioinformatics analysis. An initial multiple sequence alignment was produced 
using NCBI BLAST (1) to identify homologous regions and then Clustal to align 
the homologous regions (8). The initial alignment was manually fine-tuned to
reflect (in hierarchical order) solved coronavirus protein structures, conserved
cysteine and histidine residues, TMHMM2 transmembrane region prediction
(30), and structure/loop context from PredictProtein analysis (46). Annotations
and region boundaries displayed here were derived from published analysis by
Gorbalenya et al. (18), de novo SARS-CoV-specific domain structure-prediction
(24), and a combination of domain expression and nuclear magnetic resonance
screening for foldedness.

The following sequences were used for nsp3 alignments in Fig. 2A and S2 in
the supplemental material: group Ia, HCoV-NL63 (YP_003766), HCoV-229E
(NP_073549), PEDV (AAK38661), BtCoV 512/2005 (ABG47077); group Ib,
transmissible gastroenteritis virus (TGEV) (NP_058422), PRCoV (ABG89316), FCoV
(YP_239353); group IIa, HCoV-HKU1-A (YP_173236), HCoV-HKU1-N6
(ABD75567), MHV-JHM (AAA46457), MHV-A59 (NP_068668), BCoV
(NP_150073), HCoV-OC43 (NP_937947), HEV (YP_459949); group IIb, SARS-CoV
(AAP41036), BtCoV-HKU3 (AAY88865), BtCoV-Rf1 (ABD75321);
group IIc, BtCoV-HKU5 (ABN10892), BtCoV 133/2005 (YP_729202); group
IId, BtCoV-HKU9-1 (YP_001039970), BtCoV-HKU9-2 (ABN10918), BtCoV-HKU9-3
(ABN10926), BtCoV-HKU9-4 (ABN10934); group III, IBV-Beaudette
(NP_066134), IBV-Peafowl/GD/KQ6/2003 (AAT70073), IBV-LX4 (AAQ21583),
IBV-BJ (AAP92673); torovirus group (aligned from ADP-ribose-1″-phosphatase
[ADRP] onward), EToV (ABC26008), BToV (YP_337905). The alignment presented
in Fig. 2C and analysis in Fig. 8 include HCoV-229E, HCoV-NL63,
BtCoV 512/2005, FCoV, HCoV-HKU1, MHV-A59, HCoV-OC43, SARS-CoV
Tor2, BtCoV 133/2005, BtCoV-HKU5, BtCoV-HKU9-1, BtCoV-HKU9-4, IBV-Beaudette,
and IBV-Peafowl/GD/KQ6/2003 sequences listed under or linked
from the accession numbers above.

Kinase array analysis. A full PepChip protein kinase substrate usage profiling 
assay (Pepscan Systems, Lelystad, Netherlands) was performed according to the 
manufacturer’s instructions. Briefly, purified native SARS-CoV was lysed by 
trituration in a protease inhibitor cocktail containing 1% Triton X-100. SARS- 
CoV lysate was applied to duplicate peptide substrate arrays in the presence of 
[γ-33P]ATP. The labeled substrate array was visualized by autoradiography,
digitally scanned, and quantified using ImageJ densitometry software (NIH). 
Duplicate PepChip arrays incorporated a total of 48 nonsubstrate peptides, 
which were used as negative controls to determine the background levels in the 
densitometry analysis. Density values for these spots were used to assess and 
filter results. Peptides for which both replicate spots exceeded the mean density
value plus 1 standard deviation of the controls on the scanned autoradiograph
were taken as positive results.
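
The positive-call rule above can be sketched in a few lines. The peptide names and density values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def positive_peptides(replicates, control_densities):
    """Return (threshold, names of peptides whose two replicate spots both exceed it).

    threshold = mean + 1 standard deviation of the negative-control spots.
    """
    threshold = mean(control_densities) + stdev(control_densities)
    positives = sorted(
        name for name, (first, second) in replicates.items()
        if first > threshold and second > threshold
    )
    return threshold, positives


# Hypothetical densitometry values (arbitrary units), for illustration only.
controls = [10, 12, 11, 9, 13]
spots = {
    "peptide_A": (20, 18),  # both replicates above threshold
    "peptide_B": (20, 12),  # only one replicate above threshold
    "peptide_C": (11, 10),  # neither replicate above threshold
}

threshold, positives = positive_peptides(spots, controls)
print(f"threshold = {threshold:.2f}; positives = {positives}")
```

Requiring both replicate spots to pass guards against single-spot artifacts on the autoradiograph, at the cost of missing weakly phosphorylated substrates.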

Protein stoichiometry analysis. A detailed description and validation of
perfluoro-octanoic acid (PFO)-polyacrylamide gel electrophoresis (PAGE) as a tool
for protein stoichiometry assessment can be found elsewhere (44). Briefly, purified
protein samples were incubated at 37°C for 1 h; mixed 1:1 with PFO
loading buffer containing 8% (wt/vol) PFO, 100 mM Tris base, 20% (vol/vol)
glycerol, and 0.05% (wt/vol) orange G; and loaded onto precast 4 to 20% 
Tris-glycine gradient gels. Gel electrophoresis was performed with a standard 
Tris-glycine running buffer to which 0.5% (wt/vol) PFO was added. Protein was 
detected by SYPRO-ruby poststain (Invitrogen). 

Electrophoretic mobility shift and unwinding assays. For electrophoretic mobility
shift assay (EMSA), protein samples were mixed with 0.8 µg of RNA or
DNA substrate and assay buffer containing 150 mM NaCl-50 mM Tris at pH 8.0
to a total reaction volume of 20 µl. Sequence-matched RNA and DNA oligomers
were designed (substituting T for U as appropriate) with randomized sequences
designed to adopt single-stranded conformations: ssRNA1/ssDNA1,
5'-AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU-3', and ssRNA2/ssDNA2,
5'-AGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUC-3'.
Double-stranded RNA (dsRNA) and DNA (dsDNA) were produced
by boiling and slowly cooling equimolar mixtures of single-stranded RNA
(ssRNA) or DNA (ssDNA) (substituting T for U) oligomers,
5'-GAAAGGAAAAAGGGAGAAGA-3' and 5'-UCUUCUCCCUUUUUCCUUUC-3'.
Protein-nucleic acid mixtures were incubated at 37°C for 1 h and analyzed by native
electrophoresis on precast 6% acrylamide DNA retardation gels (Invitrogen).
Nucleic acid was detected by SYBR-gold poststain (Invitrogen) and photographed
using a UV light source equipped with a digital camera. SYBR-gold was
rinsed out and protein was subsequently detected by SYPRO-ruby poststain
(Invitrogen). Densitometry analysis was performed using a flatbed scanner with
ImageJ software (NIH). The mobility shift of RNA at each protein concentration
was calculated relative to the maximum shift observed in each experiment. Kd
(dissociation constant) values were measured from the midpoints of the fitted
titration data. 
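
The normalization and midpoint read-out described above can be illustrated as follows. This sketch substitutes plain linear interpolation between neighboring titration points for the curve fitting used in the study, and the concentrations and shift values are hypothetical; it only shows how an apparent Kd falls out of a titration once shifts are expressed relative to the maximum.

```python
def fractional_shift(shifts):
    """Express each mobility shift relative to the maximum shift observed."""
    top = max(shifts)
    if top <= 0:
        raise ValueError("no mobility shift observed")
    return [s / top for s in shifts]


def midpoint_concentration(concentrations, fractions, level=0.5):
    """Protein concentration at which the fractional shift first reaches level.

    Uses linear interpolation between adjacent points (an assumption of this
    sketch; a fitted binding curve would normally be used instead).
    """
    points = list(zip(concentrations, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 <= level <= f1 and f1 > f0:
            return c0 + (level - f0) * (c1 - c0) / (f1 - f0)
    raise ValueError("titration does not cross the requested level")


# Hypothetical titration: protein concentration (µM) and band shift (mm).
protein_um = [0, 1, 2, 4, 8, 16]
shift_mm = [0.0, 1.0, 2.0, 3.0, 3.8, 4.0]

fractions = fractional_shift(shift_mm)
apparent_kd = midpoint_concentration(protein_um, fractions)
print(f"apparent Kd of about {apparent_kd:.1f} µM")
```

Because the shift is normalized to the maximum seen in each experiment, the estimate is only meaningful when the titration reaches saturation.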

For unwinding assays, nucleic acid and protein mixtures were prepared and 
incubated as described above for the EMSAs. Instead of applying the samples 
immediately to polyacrylamide gels, samples were incubated at 4°C overnight to 
allow protein-nucleic acid complexes to dissociate before native PAGE analysis. 
Results were visualized and recorded as for EMSA. 
TABLE 2. Background host proteins excluded from this analysis

The three sample-type columns give the number of times each protein was found (of two possible times).

| Background^a | Nat^b | PK^c | No. of unique peptides^d | No. of total peptides^d | % Coverage^e | Description^f | Abbreviation^g |
|---|---|---|---|---|---|---|---|
| 2 | 2 | 2 | 5 | 9 | 22.90 | Actin beta (Cercopithecus aethiops) | ACTB |
| 1 | 1 | 0 | 3 | 4 | 9.90 | Actin kappa (Homo sapiens) | FKSG30 |
| 1 | 2 | 0 | 2 | 2 | 6.10 | Adenylyl cyclase-associated protein 1 (Homo sapiens) | CAP1 |
| 1 | 1 | 0 | 2 | 2 | 1.20 | Agrin (Homo sapiens) | AGRN |
| 2 | 1 | 0 | 5 | 7 | 4.40 | Alpha 1 type VI collagen (Homo sapiens) | COL6A1 |
| 1 | 1 | 0 | 2 | 2 | 1.00 | Alpha 1 type VII collagen (Homo sapiens) | COL7A1 |
| 1 | 1 | 0 | 5 | 5 | 4.10 | Alpha 1 type XII collagen (Homo sapiens) | COL12A1 |
| 2 | 1 | 0 | 5 | 11 | 2.90 | Alpha-2-macroglobulin precursor (Homo sapiens) | A2M |
| 1 | 1 | 0 | 9 | 16 | 24.70 | Amylase (Homo sapiens) | AMY |
| 2 | 1 | 0 | 5 | 5 | 1.50 | Apolipoprotein B precursor (Homo sapiens) | APOB |
| 2 | 1 | 0 | 5 | 5 | 27.70 | Apolipoprotein E precursor (Cercopithecus aethiops) | APOE |
| 2 | 1 | 0 | 3 | 6 | 1.20 | Chondroitin sulfate proteoglycan 2 (Homo sapiens) | VCAN |
| 2 | 2 | 0 | 4 | 11 | 3.60 | Complement component 3 precursor (Homo sapiens) | C3 |
| 2 | 1 | 0 | 7 | 18 | 2.60 | Complement component 4A/4B (Homo sapiens) | C4A/C4B |
| 1 | 0 | 0 | 2 | 3 | 0.80 | Complement component 5 (Homo sapiens) | C5 |
| 1 | 2 | 0 | 2 | 2 | 7.10 | Enolase 1 (Homo sapiens) | ENO1 |
| 1 | 1 | 0 | 2 | 2 | 0.90 | Fibrillin 1 (Homo sapiens) | FBN1 |
| 2 | 2 | 2 | 33 | 107 | 18.20 | Fibronectin 1 (Cercopithecus aethiops) | FN1 |
| 1 | 1 | 0 | 3 | 3 | 3.30 | Fibulin 1 isoform C (Cercopithecus aethiops) | FBLN1C |
| 1 | 1 | 0 | 3 | 7 | 3.10 | Fibulin 1 isoform D (Homo sapiens) | FBLN1D |
| 2 | 2 | 2 | 8 | 11 | 4.90 | Filamin 1 (Homo sapiens) | FLNB |
| 1 | 1 | 0 | 3 | 3 | 6.80 | Galectin 3 binding protein (Homo sapiens) | LGALS3BP |
| 2 | 2 | 0 | 7 | 11 | 8.20 | Gelsolin (Homo sapiens) | GSN |
| 1 | 2 | 1 | 2 | 2 | 5.10 | Glyceraldehyde-3-phosphate dehydrogenase, spermatogenic (Homo sapiens) | GAPDHS |
| 1 | 2 | 1 | 2 | 2 | 5.50 | Heat shock 70-kDa protein 8 (Homo sapiens) | HSPA8 |
| 1 | 2 | 0 | 2 | 2 | 4.50 | Heat shock 70-kDa protein 1 or 6 (Homo sapiens) | HSPA1 or HSPA6 |
| 1 | 2 | 2 | 6 | 29 | 34.50 | Hemoglobin alpha 2 subunit (Homo sapiens) | HBA2 |
| 1 | 2 | 1 | 2 | 12 | 6.80 | Hemoglobin beta subunit (Homo sapiens) | HBB |
| 2 | 2 | 2 | 15 | 21 | 4.00 | Heparan sulfate proteoglycan 2 (Homo sapiens) | HSPG2 |
| 2 | 2 | 0 | 3 | 5 | 3.90 | Inter-alpha-globulin inhibitor H2 polypeptide (Homo sapiens) | ITIH2 |
| 2 | 1 | 0 | 4 | 4 | 2.60 | Laminin, alpha 4 precursor (Homo sapiens) | LAMA4 |
| 2 | 1 | 0 | 3 | 3 | 1.90 | Laminin, beta 1 precursor (Homo sapiens) | LAMB1 |
| 2 | 1 | 0 | 4 | 5 | 3.60 | Laminin, gamma 1 precursor (Homo sapiens) | LAMC1 |
| 2 | 1 | 0 | 2 | 2 | 3.10 | Latent transforming growth factor beta binding protein 3 (Homo sapiens) | LTBP3 |
| 1 | 1 | 0 | 2 | 3 | 2.10 | Latent transforming growth factor beta binding protein 4 (Homo sapiens) | LTBP4 |
| 1 | 2 | 2 | 5 | 6 | 3.70 | Myosin, heavy polypeptide 9, nonmuscle (Homo sapiens) | MYH9 |
| 1 | 1 | 0 | 2 | 2 | 5.60 | Neuronal pentraxin I precursor (Homo sapiens) | NPTX1 |
| 1 | 1 | 0 | 2 | 2 | 2.00 | Nidogen (enactin) (Homo sapiens) | NID1 |
| 1 | 1 | 0 | 2 | 2 | 3.50 | Olfactory receptor 5, H2 (Homo sapiens) | OR5H2 |
| 1 | 2 | 1 | 2 | 2 | 5.70 | Plasminogen activator inhibitor 1 (Cercopithecus aethiops) | SERPINE1 |
| 1 | 1 | 0 | 3 | 7 | 2.30 | Pregnancy-zone protein (Homo sapiens) | PZP |
| 1 | 2 | 2 | 2 | 2 | 5.30 | Pyruvate kinase 3 (Homo sapiens) | PKM2 |
| 1 | 1 | 0 | 2 | 2 | 3.20 | Quiescin Q6 (Homo sapiens) | QSOX1 |
| 2 | 1 | 0 | 5 | 5 | 1.80 | Reelin (Homo sapiens) | RELN |
| 1 | 2 | 0 | 2 | 3 | 6.50 | S-Adenosylhomocysteine hydrolase (Homo sapiens) | AHCY |
| 1 | 1 | 0 | 3 | 16 | 9.30 | Serine (or cysteine) proteinase inhibitor F member 1 (Homo sapiens) | SERPINF1 |
| 1 | 1 | 0 | 2 | 3 | 6.20 | Stem cell growth factor precursor (Homo sapiens) | CLEC11a |
| 2 | 2 | 0 | 3 | 5 | 2.00 | Talin 1 (Homo sapiens) | TLN1 |
| 2 | 1 | 0 | 23 | 58 | 16.40 | Thrombospondin 1 (Homo sapiens) | THBS1 |
| 1 | 2 | 2 | 3 | 4 | 11.80 | Tubulin alpha (Homo sapiens) | TUBA |
| 1 | 0 | 2 | 2 | 6 | 10.90 | Ubiquitin B precursor (Homo sapiens) | UBB |
| 2 | 1 | 0 | 6 | 13 | 10.90 | VGF nerve growth factor inducible (Homo sapiens) | VGF |

^a The term “background” refers to proteins that we were unable to specifically exclude as being copurified with virus.
^b Nat, native SARS-CoV samples.
^c PK, DNase I-proteinase K-treated SARS-CoV.
^d Number of total and nonoverlapping (unique) peptides identified for each protein from the sample yielding the highest percent coverage.
^e “Coverage” here refers to the percentage of the intact protein length accounted for by unique contributions of the fragments detected by mass spectrometry.
^f In cases of unambiguous identifications of Cercopithecus aethiops proteins and cases in which C. aethiops attribution could not be ruled out, the C. aethiops sequence is noted. Proteins identified solely from homology to H. sapiens homologs are listed as H. sapiens. Protein isoforms are noted only where explicitly identified.
^g We have identified proteins here with standard abbreviations for the corresponding human genes from the NCBI Entrez Gene database (http://www.ncbi.nlm.nih.gov/sites/entrez).
TABLE 3. Host proteins identified in purified SARS-CoV grouped by function^a

| Function | Proteins |
|---|---|
| Proteins related to vesicular trafficking or viral budding | CHC1, SNX6, α-COP, β-COP, γ-COP, ARF4, CDC42 |
| Cytoplasmic and shuttling RNA-binding proteins | hnRNP-A1, hnRNP-A2/B1, hnRNP-A3, hnRNP-C, hnRNP-H1, hnRNP-H2, hnRNP-K, hnRNP-L, hnRNP-M, hnRNP-R, hnRNP-U, LRPPRC, PABPC1 or -3, PABPC4, ROD1, PCBP1, PCBP2, PPTB1 |
| Unfolded protein response | CCT complex (three subunits), HSP90 (three subunits), HSPB1, HSPD2, PPIA, VCP |
| Cytoplasmic proteins and proteins of undetermined localization | 26S proteasome (six subunits), SMIC, ACLY, ADK, AKR1 (two subunits), ASS1, CBR1, CLIC1, CLIC4, CSNK2, EPRS, FASN, GARS, GART, GDI1 or -2, GNB2L1, GSTP1, LDHA, LGALS1, NARS, NT5C2, PAFAH1B, PGAM1 or PGAM2, PTGES3, RRM1, UCH-L1, CALR, CSE1L, KPNB1, NPM1, RAN, STAT1α, YWHA, 14-3-3, STOM, STOML2, 2'-PDE, ATIC, BLVRB, CTPS1, CYB5R3, GNB1, HEATR2, HKDC1, PPP1C, PPP2 complex (one subunit), PRDX1, PRDX6, USP14 |
| ER-resident proteins | CANX, NSF |
| Membrane-associated proteins | C1orf57, SLC25A6, ACSL4, ESD, PHB2, TAGLN2 |
| Mitochondrial proteins | DLST, EF-Tu, LONP1, MDH2, MTHFD1, PCK2, F1F0 ATP synthase, HADHB, MTCH2, VDAC1, VDAC2, VDAC3 |
| Nuclear proteins | PRKD, NAP1L1, NME2, SAE1, SFPQ, H2A, H2B, H3, H4, RRP12, MATR3, IGF2BP1, DDX3, DDX5, DDX9, DDX21, DDX39, NONO, PRPF8 |
| Ribosomal subunits and translational cofactors | L3, L4, L5, L6, L7, L7a, L9, L10a, L14, L17, L18, L18a, L23, L24, L27, L27a, L30, P0, S2, S3, S3a, S5, S6, S7, S8, S9, S11, S12, S13, S16, S17, S20, S23, SA, eEF1, eEF1, eEF1, eEF2, eEF3, eEF4, eEF5a, eIF4a-1, eIF4a-3, GCN1L1 |

^a For a more detailed listing, see Table S1 in the supplemental material. Proteins listed here were identified in at least one native and one PK SARS-CoV preparation
and were not identified in background samples. We have identified proteins here with standard abbreviations for the corresponding human genes from the NCBI Entrez
Gene database (http://www.ncbi.nlm.nih.gov/sites/entrez).
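
The inclusion rule stated in the footnote above (found in at least one native and at least one PK preparation, and never in background samples) can be expressed as a simple filter. In this sketch the ACTB and UBB counts are the values reported in Table 2; the two `HYPOTHETICAL_` entries are invented to show passing and failing cases, and the function name is an assumption.

```python
def passes_inclusion_criteria(background, native, pk):
    """Apply the three criteria to detection counts (each out of two preparations)."""
    return pk >= 1 and native >= 1 and background == 0


# (background, native, PK) detection counts.
observations = {
    "ACTB": (2, 2, 2),            # Table 2: present in background, so excluded
    "UBB": (1, 0, 2),             # Table 2: present in background, so excluded
    "HYPOTHETICAL_1": (0, 1, 2),  # invented example that meets all criteria
    "HYPOTHETICAL_2": (0, 2, 0),  # invented example lost after PK treatment
}

reported = sorted(
    name for name, counts in observations.items()
    if passes_inclusion_criteria(*counts)
)
print(reported)
```

The rule is deliberately conservative: proteins such as actin and ubiquitin, which the Results argue are probably genuine virion components, are still excluded because they also appear in background samples.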


RESULTS 

Protein purification and background analysis. We investigated
the protein composition of SARS-CoV released from
Vero-E6 cells during the peak growth period from 24 h to 48 h
after inoculation. The protein fraction of clarified cell culture
supernatant was collected by polyethylene glycol precipitation,
and virus particles were purified by banding on a sucrose density
gradient. Virus purified in this way is referred to here as
“native” SARS-CoV. Purified virus subjected to surface cleansing
with DNase I followed by proteinase K is referred to as
“PK” virus.

We also attempted a proteomics analysis of Junin-Candid1
arenavirus at the same time as SARS-CoV. However, due to
the slow growth of Candid1 in our hands, the resulting samples
were essentially virus free but contained numerous
high-molecular-weight proteins associated with the cytoskeleton (21%
of the proteins identified) and extracellular matrix (60% of the
proteins identified [Table 2]). Trace sequences totaling 3.4%
of the full-length Candid1 nucleoprotein were identified, but
these samples were otherwise free of viral proteins. Nucleoprotein
is the most plentiful component of purified arenavirus
(58), but characteristic virion components such as the Candid1
SSP, GP-C, Z, and L proteins and host ribosomes (41) were
conspicuously absent from these preparations. These samples
were used to approximate the spectrum of proteins purified
from uninfected Vero-E6 cells and are referred to here as
“background” samples.

Background samples also contained several proteins previously
identified as components of other enveloped viruses, for example,
actin, myosin, and fibronectin (28). Enzymatic treatment in PK
samples appeared to eliminate most background proteins, but a few
cytostructural proteins including actin, myosin, filamin, tubulin,
and fibronectin were consistently found in PK samples, indicating
probable incorporation into the virion. Ubiquitin appeared to be
enriched following PK treatment and therefore also likely
represents a genuine virion component.

Proteomics of SARS-CoV. To determine the protein composition of
the purified native, PK, and background samples, we performed
two-dimensional liquid chromatography MS/MS analysis of the peptide
mixtures generated by in-solution digestion of the proteins. Two
primary extraction techniques were employed: TCA precipitation and
methanol delipidation. Peptides extracted by TCA and methanol
delipidation were analyzed separately, and the results were
combined. Some proteins were identified using only one extraction
technique, while others were identified with both. Except where
explicitly stated, proteins reported here met three criteria: (i)
presence in at least one PK sample, (ii) presence in at least one
native sample, and (iii) absence from both background samples.
SARS-CoV grows relatively poorly in most human cell types, and so
the virus was grown in Vero-E6 cells derived from the African green
monkey Cercopithecus aethiops. Because of the limited number of
Cercopithecus aethiops protein sequences available, peptides were
screened against a database including Cercopithecus aethiops and
Homo sapiens sequences in addition to all SARS-CoV protein
sequences of at least 9 amino acids. Using this procedure, eight
viral proteins and 172 host proteins were identified from SARS-CoV,
including the three explicit Cercopithecus aethiops sequences
cyclophilin A (PPIA), calreticulin (CALR), and STAT-1α
(overview in Table 3; see also detailed
descriptions in Table S1 in the supplemental material). Because of
the large number of proteins identified, most of the host proteins
listed in Table 3 and Table S1 in the supplemental material are
presented without regard to their potential relevance to the viral
replication cycle.

TABLE 4. Viral proteins identified in purified SARS-CoV a

Protein        No. of times found (of two       No. of peptides d    % Coverage e   Length (aa) f
               possible times) in sample type:
               Native b      Digested c         Total     Unique
N (ORF9)       2             2                  4         4          18.0           423
M (ORF5)       2             2                  6         2          19.4           222
S (ORF2)       1             2                  16        16         21.6           1,256
nsp3           1             2                  16        14         12.5           1,922
nsp5           1             2                  2         2          14.1           306
nsp2           1             1                  4         3          11.0           639
9b (ORF9b)     1             1                  2         1          22.2           99
3a (ORF3a)     1             1                  2         1          6.2            275
nsp4 g         0             2                  5         3          15.2           500
nsp9 g         0             2                  2         2          23.9           113

a Proteins are ranked in relative confidence order as a surrogate measurement
for relative copy number according to the following criteria: number of times
detected in native samples > PK samples > the product of percent coverage and
protein length.
b Purified, native virions.
c Purified, DNase I-treated, proteinase K-treated, repurified virions.
d Number of total and nonoverlapping (unique) peptides identified for each
protein.
e Coverage refers to the percentage of the intact protein length accounted
for by unique contributions of the fragments detected by mass spectrometry.
f Length, in amino acids, of each protein or proteolytically processed nsp.
g nsp4 and nsp9 were not detected in native samples and thus did not meet the
full validation criteria of this study.
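
The three inclusion criteria stated for the proteomics results (presence in at least one PK sample, presence in at least one native sample, and absence from both background samples) can be sketched as a simple filter. This is only an illustration and not the authors' pipeline: the `Detection` record and `meets_criteria` function are hypothetical names, the nsp3 and nsp4 native/PK counts follow Table 4, and the background counts and the fibronectin row are placeholder values chosen to exercise each branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical per-protein detection counts across the three sample types."""
    name: str
    native_hits: int      # native preparations (of two) in which the protein was found
    pk_hits: int          # PK-treated preparations (of two) in which it was found
    background_hits: int  # background samples (of two) in which it was found


def meets_criteria(d: Detection) -> bool:
    """Criteria (i) to (iii): seen in PK, seen in native, never seen in background."""
    return d.pk_hits >= 1 and d.native_hits >= 1 and d.background_hits == 0


# Native/PK counts for nsp3 and nsp4 follow Table 4; all other numbers are placeholders.
records = [
    Detection("nsp3", native_hits=1, pk_hits=2, background_hits=0),
    Detection("nsp4", native_hits=0, pk_hits=2, background_hits=0),
    Detection("fibronectin (placeholder counts)", native_hits=2, pk_hits=2, background_hits=2),
]

validated = [d.name for d in records if meets_criteria(d)]
print(validated)
```

Under these assumptions only nsp3 passes: nsp4 fails criterion (ii), matching its footnoted status in Table 4, and a protein present in background samples fails criterion (iii) regardless of how often it appears in virus preparations.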

Specificity of incorporated protein kinases. Coronavirus
nucleoproteins are phosphorylated by host protein kinases,
including cyclin-dependent kinase, glycogen synthase kinase,
mitogen-activated protein kinase (MAPK), and casein kinase II
(CSNK2) (59). Nucleoprotein phosphorylation has been proposed as
the mechanism leading to incorporation of host protein kinases in
coronavirus particles, as has been demonstrated for MHV (56). The
two host protein kinases identified here by mass spectrometry
(CSNK2 and DNA-activated protein kinase [PRKD]) function in host
signaling cascades and are therefore of potential importance to
SARS-CoV pathogenesis. A function-based screening method was used
to further investigate the presence of protein kinases identified
by mass spectrometry: native SARS-CoV lysates were used to
radiolabel a microarray containing 1,152 peptides with known
phosphorylation sites (see Fig. S1 in the supplemental material).

Substrates that were phosphorylated by at least 1 standard
deviation above background levels in each of two replicate arrays
are reported here. Of 77 phosphorylated substrates, 29 could be
linked with a specific protein kinase. Three kinase activity
signatures were detected multiple times in the virion lysate, i.e.,
CSNK2 (four substrates), protein kinase A (PRKA; 12 substrates),
and protein kinase C (PRKC; five substrates). Other kinase
signatures represented by a single phosphorylated substrate
included CAMK2, CKS1, CSK, epidermal growth factor receptor, GRK1,
MAPK1, PHK, and RPS6K. Of these, CSNK2 was detected in both PK and
native virion lysates and thus represents a probable virion
component. Ribosomal protein S6 kinase (RPS6K) was found in both PK
samples and is probably incorporated, as we conclude from the
generally heavy ribosomal protein representation in SARS-CoV as
well as the specific presence of the RPS6 substrate in the sample.
PRKA, PRKC, and MAPK1 were absent in PK samples, and each was
detected in only one native sample (data not shown), and therefore
we concluded that they were present through adventitious
copurification or entanglement at the virion surface. One protein
kinase detected by mass spectrometry, PRKD, was not detected by
substrate phosphorylation, possibly due to the presence of only
three validated PRKD substrates on the chip.
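
The reporting threshold for the peptide array (signal at least 1 standard deviation above background in each of two replicate arrays) can be sketched as follows. How background was measured is not described in the text, so this sketch assumes, purely for illustration, that each array provides a set of background spot intensities and that the cutoff is their mean plus one sample standard deviation. The function name and all intensity values are hypothetical.

```python
from statistics import mean, stdev


def is_phosphorylated(signals, backgrounds, n_sd=1.0):
    """Return True if the substrate signal clears the cutoff on every replicate array.

    signals:     one intensity per replicate array for a single substrate
    backgrounds: one list of background intensities per replicate array
    Assumption: cutoff = mean(background) + n_sd * sample SD(background).
    """
    for signal, background in zip(signals, backgrounds):
        cutoff = mean(background) + n_sd * stdev(background)
        if signal < cutoff:
            return False
    return True


# Hypothetical background intensities for two replicate arrays.
backgrounds = [
    [100, 110, 90, 105, 95],
    [200, 220, 180, 210, 190],
]

print(is_phosphorylated((150, 300), backgrounds))  # clears the cutoff on both arrays
print(is_phosphorylated((150, 210), backgrounds))  # falls short on the second array
```

Requiring the cutoff to be met on both arrays, rather than on their average, means that one strong replicate cannot compensate for a weak one.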

Relative abundance of viral proteins. Protein detection by mass
spectrometry proteomics depends on factors including abundance,
sensitivity of detection, enzymatic pretreatment, extraction
method, proteinase accessibility, and availability of potential
proteolytic cleavage products of appropriate molecular weight. Mass
spectrometry is therefore not an optimal tool for precise
measurement of the absolute stoichiometry of incorporated
components but can provide a general idea of ranked abundance
within a sample. We used a hierarchy of native detection frequency
> PK detection frequency > peptide coverage relative to protein
length for a tentative ranking of the relative abundance of viral
and host proteins in SARS-CoV (Table 4). SARS-CoV N, M, and S were
consistently among the 10 most abundant proteins detected in PK
samples. The accessory SARS-CoV ORF3a and ORF9b proteins and nsp2,
nsp3, and nsp5 were present in lower relative abundance and were of
equal or lesser abundance in PK samples than were some ribosomal
proteins, histones, heat shock protein 90, and phosphatase I
(Table 4; also see Table S1 in the supplemental material).
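
The ranking hierarchy can be expressed as a lexicographic sort. The sketch below uses the viral protein values from Table 4 and the tiebreaker given in that table's footnote (the product of percent coverage and protein length). It is an illustration of the stated ordering rule rather than the authors' own procedure, and the names `table4` and `rank_key` are introduced here for convenience.

```python
# (protein, native detections, PK detections, percent coverage, length in aa),
# taken from Table 4 and deliberately listed out of order.
table4 = [
    ("nsp9", 0, 2, 23.9, 113),
    ("3a (ORF3a)", 1, 1, 6.2, 275),
    ("nsp3", 1, 2, 12.5, 1922),
    ("M (ORF5)", 2, 2, 19.4, 222),
    ("nsp2", 1, 1, 11.0, 639),
    ("S (ORF2)", 1, 2, 21.6, 1256),
    ("nsp4", 0, 2, 15.2, 500),
    ("N (ORF9)", 2, 2, 18.0, 423),
    ("9b (ORF9b)", 1, 1, 22.2, 99),
    ("nsp5", 1, 2, 14.1, 306),
]


def rank_key(row):
    """Native frequency first, then PK frequency, then coverage x length."""
    _, native, pk, coverage, length = row
    return (native, pk, coverage * length)


# Tuples compare element by element, so a descending sort applies the hierarchy.
ranked = [row[0] for row in sorted(table4, key=rank_key, reverse=True)]
for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Sorting on this key reproduces the row order of Table 4, with N and M first and nsp4 and nsp9 last because neither was detected in native samples.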

Characterization of novel SARS-CoV virion proteins. Mass
spectrometry proteomics revealed the SARS N, M, S, ORF3a, and ORF9b
proteins, as well as three components of pp1a, i.e., nsp2, nsp3,
and nsp5, as viral components (Table 4). Two additional replicase
components, i.e., nsp4 and nsp9, were enriched in PK virus but were
not detected in native SARS-CoV preparations. An interaction
network was created incorporating biochemical interaction data (see
reference 65 and references therein) and protein-RNA interaction
data (11, 38, 53, 54) to illustrate the network of interactions
related to virion assembly (Fig. 1). All viral proteins identified
in this study or known from other, previously published work can be
linked directly or indirectly to the four major virion components
(defined here as major components with respect to copy number and
relative molecular weight), i.e., N, M, S, and the genomic RNA. We
were unable to detect the small, hydrophobic E protein in SARS-CoV
lysates by mass spectrometry.

FIG. 1. Interaction map for SARS-CoV-derived components. Double
outlines indicate major components, including known
high-copy-number virion proteins and the large viral RNA genome,
and minor components, including low-copy-number and weakly
conserved proteins. Black outlines identify components detected by
mass spectrometry proteomics. Gray outlines indicate components
identified in other published studies. Solid single outlines denote
novel components identified in both native and PK SARS-CoV.

FIG. 2. Overview of nsp3 organization. (A) Multiple sequence alignment of coronavirus and torovirus nsp3 homologs. The 16-component
functional annotation presented here (Func) is an extension of our previous SARS-CoV-specific domain boundary prediction (SARS) and the
ongoing analysis by Gorbalenya and collaborators (Gorb). It incorporates domain boundaries defined in a hierarchy of functional (f), structural
(s), and phylogenetic (p) criteria. The functional annotation was compiled from published data and results presented here. Region designations
include the following: ubiquitin-related domains (UB1 and UB2), an acidic hypervariable region (AC), complete (PL1pro and PL2pro) or partial
(pro) papain-like cysteine proteinases, ADRP, a SARS-CoV subgroup-specific MBD, the carboxyl-terminal moiety of the “SARS-unique domain”
(SUD-C), group II-specific NAB domain and marker domain (G2M), two predicted double-pass transmembrane domains (TM1-2 and TM3-4),
a putative metal-binding region (ZF), and three subdomains forming part of the Y region (Y1 to Y3) originally described by Gorbalenya et al. (18).
Dotted lines denote additional subgroup-specific domains not included in the annotation above. Amino acid residues are color coded gray
(AFGILMPVWY), light blue (KNQRST), blue (CH), or red (DE) to highlight patterns that may mark conserved protein structures. We divide
group II into four subgroups following published suggestions (71) and divide group I into two subgroups. Sequences from equine and bovine
toroviruses are shown from the domain homologous to ADRP onward. (B) Selected SARS-CoV expression constructs. Solid lines denote
expression (also Table 1); dashed lines indicate that no expression has so far been obtained. (C) Enlargement of the ZF and flanking regions, with
transmembrane domain predictions. The overlay shows the average transmembrane probability score for 400-amino-acid regions centered on the
first conserved cysteine of ZF. A red overlay displays average transmembrane probability scores calculated by TMHMM2 for this region from a
set of 15 representative coronaviruses, approximately equally weighted with respect to each subgroup (see Materials and Methods). For display
purposes, in this panel the sequences are aligned only with conserved clusters of four cysteine/histidine residues in ZF and Y1 (α and β).
(D) Structural annotation of SARS-CoV nsp3. Experimentally characterized flexibly disordered regions are indicated with dashed green lines, and
predicted flexible regions separating conserved domains are indicated with solid green lines.

Proteins with many interacting partners identified among
virion-incorporated proteins included the M protein, which
coordinates S, E, N, and possibly RNA incorporation into the
virion, and nsp3, which is a novel virion component. Less than half
of the SARS-CoV nsp3 protein has been characterized to date. The
characterized regions include a poly(ADP-ribose)-binding ADRP, a
papain-like proteinase and deubiquitinase (PL2pro), and two domains
with ubiquitin-like folds (UB1 and UB2). Therefore, we selected
SARS-CoV nsp3 for further characterization.


Phylogenetic analysis of nsp3. The most frequently encountered
protein globular domains are formed from contiguous polypeptide
chain segments of about 100 amino acid residues (68). Previous
bioinformatics analyses of nsp3 had identified only a few domains
fitting this criterion, but they predicted several large regions
likely to include multiple structural domains. We therefore
compiled a higher-resolution analysis of nsp3 domain architecture
as a tool for novel structural and functional characterization. We
performed a phylogenetic analysis of nsp3 (Fig. 2; see also Fig. S2
in the supplemental material) to identify small, conserved regions
that might yield expressible protein domains. Protein sequence
analysis of coronavirus and torovirus nsp3 homologs revealed a
pattern of alternating conserved and nonconserved regions,
consistent with a multiple-domain and linker structure (Fig. 2A;
see also Fig. S2 in the supplemental material). Results from
previously published studies (17, 61, 73) and fold recognition
software (24) were incorporated in this process of construct
design. Previous studies showing that the UB1, PL2pro, and ADRP
domains of nsp3 were both well folded and functional when expressed
separately were taken as support of the domain and linker structure
of nsp3 (45, 47, 53).

FIG. 3. Oligomerization of SARS-CoV nsp3 domains. (A) PFO-PAGE analysis reveals the oligomeric state of selected nsp3 domains in
solution. A Benchmark protein ladder (M) was used to estimate protein and protein complex molecular masses, indicated in kDa at left. Lanes
in panel A contain, from left to right, 25 μM, 50 μM, and 100 μM nsp2, ADRP, and SUD; 25 μM and 50 μM UB2-PL2pro; and 50 μM and 100
μM NAB, respectively. (B) Reducing sodium dodecyl sulfate-PAGE analysis of selected nsp3 domains. Lanes in panel B contain, from left to right,
50 μM and 100 μM nsp2, NAB, SUD, ADRP, and UB1, respectively.

As shown in Fig. 2B, predicted domains located toward the amino
terminus of nsp3 were tested and found to be generally amenable to
expression as domains, while all but one of the regions downstream
of the PL2pro domain were not efficiently expressed. One possible
reason for the expression difficulties may lie in the presence of a
long hydrophobic domain predicted to contain four transmembrane
spans in this region (Fig. 2C). Based on the expression pattern and
the available structural data, a general model of nsp3 structure
was proposed (Fig. 2D). In modeling nsp3, we were guided by the
assumption that nsp3 topology would be constant among
coronaviruses. The proposed structure contains four transmembrane
spans and places nearly all of nsp3, including the PL2pro domain,
on one face of the membrane. The domain topology of the model of
membrane-embedded nsp3 is inferred from the presence of PL2pro
cleavage sites at both termini of nsp3 and bioinformatic
predictions. While the exact number of transmembrane spans is not
certain, any multiple of two could be conducive to
posttranslational processing of nsp3 by PL2pro and would present
the bulk of nsp3 on the same membrane face occupied by nsp5 3CLpro
and the pp1b replicase proteins. Our model of TM distribution
(Fig. 2C) follows the 3TM + 1TM distribution of transmembrane
regions recently proposed for MHV nsp3 (26), which was based in
part on observed glycosylation patterns from truncated nsp3
constructs (19, 26) and is consistent with an independent model of
nsp4 structure (40). The interpretation presented in Fig. 2C
includes all three major phylogenetic groups and the newly
sequenced group II bat coronaviruses. Although we note that
phylogenetic evidence more consistently suggests a 2TM + 2TM
distribution across the coronavirus family (Fig. 2C), the weight of
biochemical evidence currently favors the 3TM + 1TM distribution.


Several types of domain designation may be possible for a given
set of input sequences, depending on the criteria used for
selection. Here we present a working functional annotation based on
a hierarchy of functional > structural > phylogeny-based domain
identification. Where protein function and structure are known,
“functional” domains such as ADRP and PL2pro have been noted. Where
only the structure is known, as for ubiquitin-related UB2,
“structural” domains are noted. Where only the primary sequence
data were available, islands of sequence conservation, termed
“phylogenetic” domains such as Y1 to Y3, were designated. Our
analysis revealed 16 conserved nsp3 domains (identified here as
UB1, AC, PL1pro, ADRP, MBD [metal-binding domain], SUD-C, UB2,
PL2pro, NAB, G2M, TM1-2, ZF, TM3-4, Y1, Y2, and Y3), of which
between 12 and 15 domain homologs could be identified in any one
coronavirus (Fig. 2A). Tryptic peptide fragments of nsp3 identified
by mass spectrometry were derived from the ADRP (four peptides),
MBD (one peptide), SUD-C (three peptides), PL2pro (two peptides),
Y1 (two peptides), and Y2 (three peptides) domains. The multidomain
construct SUD, with residues 389 to 726, encompasses the newly
annotated MBD and SUD-C domains.

Stoichiometry of nsp3. PFO is a nondissociative detergent that
can be used with native PAGE to determine the mass of protein
complexes (44). We investigated the oligomeric structure of
purified nsp3 domains using PFO-PAGE. The expressed domains and
multidomain constructs of nsp3 tested here (Fig. 3) and previously
(45, 53) appeared to migrate mainly as monomeric species, with
trace amounts of dimers visible, while lysozyme and protein
molecular weight markers migrated as monomers, as previously
reported (44). In contrast, full-length nsp2 was primarily
monomeric, with a small concentration of trimeric species and
traces of dimeric, tetrameric, and higher-molecular-weight species
(compare Fig. 3A and 3B), confirming that monomer > dimer
oligomerization is characteristic of nsp3 domains.

PFO-PAGE analysis of mixed nsp3 domains revealed the presence of
high-molecular-weight species consistent with the size of 1×
UB1+SUD, 2× UB1+SUD, and 1× ADRP+SUD (Fig. 4). Both UB1-SUD
complexes disappeared when additional nsp3 domains were added prior
to incubation, consistent with a weak ionic interaction between
acidic UB1 and SUD, whereas the ADRP-SUD complex formation was
enhanced in the presence of additional nsp3 domains. Homodimeric
forms of nsp3 domains also persisted in the presence of additional
nsp3 domains. As shown in the rightmost lanes of Fig. 4, SUD+ADRP
and SUD+SUD complexes were present in the same sample. Products of
the expected size for a SUD+SUD+ADRP complex were not observed,
indicating that the SUD and ADRP binding sites on an SUD molecule
either overlap or are mutually antagonistic. Although the in vitro
data show that the individual nsp3 domains are predominantly
monomeric, there is also support for the hypothesis that in vivo
the macromolecular structure of nsp3 may be constrained by multiple
intrachain and homotypic interchain interactions and that nsp3 may
exist in multiple alternate macromolecular assemblies.

FIG. 4. PFO-PAGE analysis of interdomain oligomerization.
Approximately equimolar concentrations of bacterially expressed
nsp3 domains were incubated separately (left) or in combination
(right) at 37°C for 1 h and analyzed by PFO-PAGE. The panel at left
demonstrates the electrophoretic mobility of each protein species
and homooligomer; lanes at left contain 2 and 1 nanomole of UB1,
ADRP, or SUD or 10 and 5 nanomoles NAB, respectively. Each lane at
right depicts mixtures of 2 nanomoles of UB1, ADRP, or SUD and 5
nanomoles NAB as shown. Proteins were visualized with SYPRO-ruby
staining. Marked bands correspond to 50-kDa and 110-kDa UB1+SUD
complexes (filled triangles) and 60-kDa ADRP+SUD complexes (open
triangles). In the presence of additional nsp3 domains, UB1+SUD
complexes are not formed, but the amount of ADRP+SUD complex is
increased. Duplicate samples are shown for the four-domain mixture.
Lanes containing the Benchmark protein ladder are indicated (M),
with masses in kilodaltons indicated at left.

Metal binding analysis. A relatively large number of MBDs have
been discovered or predicted among coronavirus replicase proteins.
The recent structure of nsp10 revealed two zinc fingers (25), nsp15
utilizes manganese as a cofactor (23), and both the nsp13 helicase
and nsp14 exonuclease domains contain conserved clusters of
cysteine and histidine residues that are characteristic of metal
ion-binding domains. In addition to the validated MBD located
within PL2pro, at least three other conserved potential
metal-binding motifs exist in the carboxyl-terminal region of nsp3
(Fig. 2C; ZF, Y1α, and Y1β). During some but not all purifications
of bacterially expressed SUD, addition of protein caused a visible
“bleaching” effect on the Talon affinity matrix which was
interpreted to arise from cobalt stripping activity.

To test for metal-binding activity by SUD, we added additional
CoCl2 and ZnCl2 to purified SUD, SUD451-651, and SUD-C domains and
examined the UV-visible spectra (Fig. 5). Zinc binding does not
produce a detectable spectral change, but charge transfer between
cobalt(II) and sulfur atoms (here, probably cysteine residues)
produces a characteristic absorption signal with peaks at ~310 and
340 nm (5). UV-visible spectrum analysis indicated that full-length
SUD (389 to 726; Fig. 5A to C) bound cobalt, whereas neither the
truncated SUD (451 to 651; Fig. 5F) nor the carboxyl-terminal
portion of this region (SUD-C 513 to 651; Fig. 5D and E) showed
evidence of cobalt binding. Addition of zinc(II) to
cobalt-complexed SUD did not dampen the S-Co(II) spectral signal
but appeared to induce additional protein precipitation, visualized
as a general increase in the absorbance in the far-UV range, which
was confirmed by visual inspection. The precipitation-corrected
310-nm absorbance curves (Fig. 5B, inset) are most consistent with
binding of a single cobalt atom per SUD molecule. Addition of zinc
following cobalt saturation did not diminish the spectral signal at
310 nm, indicating that equimolar zinc is unable to displace cobalt
bound to SUD. These data were interpreted to indicate that a
cysteine-coordinated metal ion-binding site with a high affinity
for cobalt is localized partly or wholly in the amino-terminal
domain of SUD, which we therefore describe as the MBD. SUD contains
six conserved cysteines (SARS-CoV nsp3 positions 393, 456, 492,
507, 550, and 623) and two conserved histidine residues (positions
539 and 613), which could participate in a tetrahedral metal ion
coordination site. The lack of metal ion binding by truncated
SUD451-651 suggests that Cys393 may have a key role in metal ion
coordination.

FIG. 5. Titration of cobalt binding by 10 μM SUD and SUD-C. UV-visible spectra of 10 μM full-length SUD (A to C), SUD-C (D and E), and
truncated SUD451-651 (F) solutions were measured after addition of 0 to 5 molar equivalents of Co(II) in the form of CoCl2. Relative Co(II)
concentration is indicated with colored lines running from red (0 equivalents) to violet (5 equivalents). Because of the observed metal ion
concentration-dependent protein precipitation during these experiments, both the raw absorbance at 310 nm (A310; panels B, C, E, and F; black
circles) and normalized absorbance (A310/A250; open circles) are plotted. (C) Displacement of Co(II) by Zn(II) was investigated by addition of
ZnCl2 to 10 μM SUD solutions that had been previously saturated with 5 equivalents of Co(II).

Nucleic acid binding analysis. We previously reported that both
the UB1 domain and the glutamic acid-rich acidic (AC) hypervariable
domain, collectively known as nsp3a, consistently copurified with
nucleic acid, implicating nsp3 as a nucleic acid-binding protein
(53). EMSAs were performed to investigate whether nsp3 domains
concealed further nucleic acid-binding sites. Two domains, the
full-length SUD and the NAB domain, which immediately follows
PL2pro, exhibited nucleic acid-binding activity at micromolar
concentrations (Fig. 6). SUD and NAB were therefore tentatively
annotated as nucleic acid-binding domains pending further
functional characterization. Relatively high micromolar
concentrations of the SUD-C domain produced a reproducible but
indistinct electrophoretic mobility shift in the presence of
nucleic acids, which may be attributable to an electrostatic
interaction mediated by the net positive charge of SUD-C at neutral
pH. The lack of appreciable nucleic acid-binding affinity by SUD-C
suggests that MBD may modulate nucleic acid binding by the
full-length SUD. RNA binding, rather than DNA binding, is expected
to be the native function of nsp3 domains as was previously
suggested for SARS-CoV nsp9 (11). SUD and NAB showed an equivalent
or slightly higher affinity for ssRNA than for dsRNA (Fig. 6B).
Neither nsp2 nor ADRP showed appreciable binding to any of the
generic RNA or DNA substrates tested (Fig. 6).

Bacterially expressed NAB displayed similar ATP-independent
dsDNA unwinding properties (Fig. 7), consistent with preferential
single-stranded nucleic acid binding. We compared the activity of
NAB with that of a previously described amino-terminal structured
domain of the SARS-CoV N protein (N-NTD [48]). Approximately
20-fold-less NAB was required to generate the same level of
unwinding activity on dsDNA as that observed for N-NTD on dsRNA
(Fig. 7B). Single-strand binding and double-strand unwinding by NAB
are consistent with a nucleic acid chaperone function, which has
also been proposed for the coronavirus nucleoprotein (74).

Annotation of uncharacterized domains. We were prevented from
experimentally characterizing the function of all nsp3 domains,
since seven domains were not expressed by E. coli: G2M, TM1-2, ZF,
TM3-4, Y1, Y2, and Y3. We therefore used conservation-based
statistics to qualitatively profile the function of the unexpressed
domains. This method is based on two hypotheses: (i) the extent of
protein conservation mirrors the relative importance in the virus
replication cycle and (ii) conserved enzymatic activity should
place more constraints on protein sequence divergence than
nonenzymatic function or species-specific “accessory” enzymatic
function. We calculated the maximum percent amino acid identity for
392 pairs of aligned protein or domain homologs belonging to
different subgroups within the same group (i.e., group IIa versus
group IIb and group IIa versus group IIc). Whole proteins were used
for this analysis, except where multiple sequence alignments
revealed the presence of multiple conserved domains surrounded by
areas of very low conservation, as was observed for the amino- and
carboxyl-terminal domains of the N protein. Proteins and domains
from 13 representative coronaviruses were included in the
comparison. As expected, enzymatic and nonenzymatic functions
corresponded to significantly different levels of conservation
(Mann-Whitney U test; P < 0.001; see Fig. 8). These results
reflected the following rank order of conservation: enzymes >
enzymatic domains > nonenzymes > nonenzymatic domains. These data
demonstrate how a qualitative functional assignment can be inferred
from the degree of conservation for coronavirus proteins.
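
The two calculations named above, maximum pairwise percent identity between homologs from different subgroups and a Mann-Whitney comparison of two sets of identity values, can be sketched in plain Python. This is a hedged illustration, not the authors' code. The identity denominator (aligned columns in which neither sequence has a gap) is an assumption, since the text does not define it. The sequences and identity values are invented toy data, and only the U statistic is computed here, not the P value.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap.

    Assumption: gap-containing columns are excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def max_identity(subgroup_a, subgroup_b) -> float:
    """Maximum percent identity over all cross-subgroup pairs of aligned homologs."""
    return max(percent_identity(a, b) for a in subgroup_a for b in subgroup_b)


def mann_whitney_u(xs, ys) -> float:
    """U statistic for xs: count of (x, y) pairs with x > y, with ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Toy aligned fragments (invented, not real coronavirus sequences).
print(percent_identity("MKV-LA", "MRVQLA"))            # 4 of 5 ungapped columns match
print(max_identity(["MKV-LA", "MKVQLA"], ["MRVQLA"]))  # best cross-subgroup pair

# Toy identity values for two functional classes (invented numbers).
enzymatic = [80.0, 75.0, 90.0]
nonenzymatic = [40.0, 55.0, 75.0]
print(mann_whitney_u(enzymatic, nonenzymatic))
```

A U value near the product of the two sample sizes (here 3 x 3 = 9) indicates that values in the first class almost always exceed those in the second. Converting U to a P value requires the null distribution, which a statistics library would normally supply.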

We examined protein conservation for the aforementioned seven
uncharacterized nsp3 domains (Fig. 8). Conservation analysis
predicted nonenzymatic (or nonconserved enzymatic) function for the
four domains G2M, TM1-2, ZF, and TM3-4. All three domains from the
Y region (Y1 to Y3) were approximately equally conserved and ranked
between enzymes and enzymatic domains. From the consistently high
conservation of Y1, Y2, and Y3, we hypothesize that Y1 to Y3 may
form a single functional unit with a conserved enzymatic function.

DISCUSSION 

Limitations in the interpretation of mass spectrometry results.
The results presented here indicate that nsp3 and several other
proteins of viral and host origin may be contained in purified
virions. Although we have not formally eliminated rare vesicles of
a buoyant density similar to that of SARS-CoV as a source for some
of the noncanonical proteins detected in this study, we believe
that this possibility is remote, based on our earlier observations
of the purity of SARS-CoV preparations (see above and reference 39)
and considering the fact that other replicase proteins such as the
polymerase and helicase were not observed in this study. The
biological significance of nsp3 packaging and the implications for
other coronaviruses remain to be determined.

DNase and proteinase K treatments were performed to differentiate
between proteins entwined or embedded at the virion surface and
internal proteins. Data presented in the “PK” column of Table 2
demonstrate that the enzymatic treatment followed by an additional
density gradient purification step did reduce detection of most
“background” extracellular matrix proteins below the threshold of
detection. However, proteinase K treatment did not completely
eliminate all viral surface proteins. The spike protein ectodomain
was detected after enzymatic treatment of the virions, possibly
because of the persistence of a proteinase-resistant core. Thus, we
were unable to rule out either possible topology for nsp3 in the
viral membrane based on proteolytic cleavage by proteinase K.

FIG. 6. Generic nucleic acid binding properties of SUD-C, SUD, and NAB domains of nsp3. (A) EMSAs were performed with
sequence-matched 20-nucleotide dsDNA or dsRNA or one of two functionally equivalent sets of sequence-matched 40-nucleotide ssDNA or ssRNA
oligomers. Gels were stained for protein or nucleic acid as indicated. Lanes containing protein only at the highest listed concentration (P), 800 ng
of nucleic acid only (N), dsDNA ladder marker (M), and mixtures of protein with 800 ng nucleic acid are indicated. Protein concentration decreases
in twofold increments from left to right within the indicated range. Maximum protein concentrations used here were determined empirically by
expression and stability in solution at 4°C. Electrophoretic mobility ranges for nucleic acids (black brackets), protein (small triangles), and
protein-nucleic acid complexes (white brackets) are indicated on the right. SUD-C has a small net positive charge at neutral pH and migrated
through the gel only in complex with nucleic acid (NA). Results from two single-stranded nucleic acid sequences that behaved equivalently in
non-sequence-specific EMSA are shown. (B) Binding curves were constructed from densitometry data calibrated to the maximum and minimum
binding in each gel. The range in which increasing nucleic acid binding was observed is indicated with a bold line above each graph to facilitate
comparison. SUD binding curves may overestimate affinity since maximal binding overlapped with the limit of protein solubility.

Origin and quantification of proteins detected in our analysis.
The diversity of proteinase K-resistant host proteins that
we found to be associated with purified virions may best be
explained as a manifestation of the internal state of the
infected host cells at the time of peak viral release. Between 24
and 48 h postinoculation the infected Vero-E6 cells became
rounded and detached, with a granular appearance characteristic
of late-stage infection. The extensive collection of histones
observed in SARS-CoV virions following sequential treatment
with DNase I and proteinase K argues against entwined host
chromatin on the virion surface as the source of these proteins,
just as the absence of the most common, high-copy-number
nuclear proteins argues against copurification of intact nuclei.
Packaged shreds of degraded chromatin from apoptotic cells
would seem to be a more likely source, and it has been estimated
that 95% of the Vero-E6 cells are apoptotic within 48 h
of infection (37). This could also explain the presence of the
mitochondrial and ribosomal proteins observed. Analysis of
background proteins pelleted from the supernatant of Vero-E6
cells over the same time period did not reveal any histone,
ribosome, nuclear, or mitochondrial proteins. When discussing
these observations, we need to keep in mind that the presently
used collection procedure was designed to maximize virus yield
and therefore protein detection. Analysis of virus collected
before the onset of apoptosis might reveal a somewhat different
protein profile and will be the focus of a future study.

FIG. 7. Duplex unwinding activity of NAB and comparison with
SARS-CoV nucleoprotein amino-terminal structured domain (N-NTD).
Samples of NAB (A) or N-NTD (B) protein and single-stranded
or duplex nucleic acid were mixed and incubated as for
EMSA and then chilled overnight to allow the protein-nucleic acid
complexes to dissociate before analysis by native PAGE. Lanes
containing the highest concentration of protein only (P), nucleic acid only
(N), and dsDNA marker (M) are indicated. Double-stranded (filled
triangles) and single-stranded (open triangles) nucleic acids were
detected with SYBR-gold dye, which stains double-stranded nucleic acid
more prominently than single-stranded nucleic acid. Protein
concentration decreases in twofold increments within the range shown.
Enlargements showing the dose-dependent nucleic acid unwinding
activity are included at the bottom of each panel.

FIG. 8. Use of amino acid conservation to infer function for
experimentally uncharacterized nsp3 domains. Average percent identity
(API) was measured by pairwise alignment of conserved proteins and
domains from different subgroups (Ia versus Ib, IIa versus IIb, etc.) or
groups (I versus III, etc.). Conserved coronavirus proteins are grouped
by functional class, including enzymes (P-E; nsp5, nsp12, nsp13, nsp14,
nsp15, and nsp16), nonenzymatic proteins (P-NE; M and E),
enzymatic domains (D-E; ADRP, PL1pro, and PL2pro), and putative
nonenzymatic domains (D-NE; UB1, AC, SUD-C, UB2, NAB, and two
nucleoprotein domains). Dotted lines mark intersubgroup API values
associated with domains not found in all groups (PLP1, SUD-C, NAB,
and G2M). Subgroup-specific markers such as SARS-CoV MBD were
not included. Uncharacterized nsp3 domains clustering with enzymatic
(UD-E; Y1, Y2, and Y3) and nonenzymatic (UD-NE; TM1-2, ZF,
TM3-4, and G2M) classes are indicated.
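
The average percent identity (API) measure described in the Fig. 8 legend can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure used for the figure: it assumes the sequences have already been aligned to equal length with `-` as the gap character, it counts identity only over columns where both sequences have a residue (one of several common conventions), and the function names and toy sequences are hypothetical.

```python
from itertools import product


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def average_percent_identity(group_a, group_b):
    """Mean percent identity over every between-group sequence pair."""
    scores = [percent_identity(a, b) for a, b in product(group_a, group_b)]
    return sum(scores) / len(scores)


# Toy aligned fragments, not real coronavirus sequences.
group_1 = ["MKTAYIAK", "MKTAYLAK"]
group_2 = ["MRTA-IAK", "MKSAYIAR"]
api = average_percent_identity(group_1, group_2)
```

Comparing only between-group pairs, as `product` does here, mirrors the intersubgroup and intergroup comparisons named in the legend; within-group pairs are deliberately left out.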

Proteins previously reported to interact with incorporated
coronavirus proteins and genomic RNA were well represented
in the proteomics results, including two proteins reported to
bind N protein, i.e., cyclophilin A and 14-3-3 (tyrosine
3-monooxygenase/tryptophan 5-monooxygenase activation protein)
(34, 59). We did not detect UBE2I, which is an E2
ubiquitin-conjugating enzyme reported to interact with the SARS-CoV
nucleoprotein (32), but did identify three other points of
contact with the ubiquitin and ubiquitin-like conjugation
pathways, i.e., the ubiquitin-specific proteinase 14 (USP14), the
ubiquitin-carboxyl-terminal esterase L1 (UCH-L1), and the
SUMO-1 activating enzyme 1 (SAE1). Further research may
determine whether the presence of these host proteins could
be related to the presence of two ubiquitin-like domains (45,
53) or to the ubiquitin-cleaving activity of nsp3 (33). Proteins
previously reported to interact with the MHV genome, including
polypyrimidine tract binding protein (PPTB1), cytoplasmic
polyadenosine binding proteins (PABP4 and PABP1/3),
and heterogeneous nuclear ribonucleoproteins (hnRNPA1,
hnRNPA2/B1, and hnRNPA3), were among the numerous
RNA-binding proteins detected (55). However, in this study we
were unable to distinguish whether RNA-binding proteins
were incorporated bound to viral RNA or host mRNAs or as
soluble proteins.

The relative abundance of proteins detected by mass
spectrometry can be approximated from the reproducibility of
detection and from the variety of peptides found, with
more-complete coverage expected for overrepresented proteins than
for rare proteins. Our data are consistent with published
reports that the SARS-CoV M and N proteins are highly
abundant in the virion, closely followed by the S protein (16).
Proteins that were detected in fewer experiments and with
lower coverage, such as ORF3a, ORF9b, nsp2, nsp3, and nsp5
proteins, are therefore likely to be present in lower relative
copy numbers. Each SARS-CoV preparation that was analyzed
contained ~10^10 virions, making it possible that proteins
present in single copies on only a small percentage of virions
could still be identified.
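
The two proxies for relative abundance named above, peptide coverage and reproducibility of detection, can be made concrete with a short Python sketch. It is an illustration under simple assumptions (exact peptide matches, coverage counted per residue), not the scoring used in this study; the function names and the toy sequence and peptides are hypothetical.

```python
def percent_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        # Mark every exact occurrence of the peptide, including overlaps.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


def detection_frequency(detected_flags):
    """Fraction of independent experiments in which the protein was seen."""
    return sum(detected_flags) / len(detected_flags)


# Toy data, not results from this study.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
toy_peptides = ["CDEFG", "FGHIK", "STVWY"]
coverage = percent_coverage(toy_protein, toy_peptides)
frequency = detection_frequency([True, True, False, True])
```

Overlapping peptides are counted once per residue, so two peptides spanning the same stretch do not inflate coverage.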

Connections to vesicular trafficking pathways. The host
factors involved in coronavirus budding remain largely unknown.
Most enveloped viruses bud by coopting host proteins, often
from the intracellular ESCRT transport pathways (reviewed in
reference 67). Elements of the clathrin and COPI protein
complexes were identified in PK SARS-CoV preparations.
Clathrin coats assemble at the plasma membrane and the
trans-Golgi network, which are quite distant from the
endoplasmic reticulum (ER)-Golgi intermediate compartment
(ERGIC) where SARS-CoV budding occurs. Other components
of assembled clathrin lattices, including clathrin light
chain and adaptor proteins, were not detected, suggesting that
free clathrin may have been captured from the cytoplasm at the
time of budding.

Three COPI components (α-COP, β-COP, and γ-COP) and
a protein involved in coatomer assembly (ARF4) were
detected. The COPI coatomer plays a role in transport between
the ER and the Golgi apparatus and in transport between
Golgi stacks (27). COPI proteins are abundant at ERGIC
membranes and have been shown to colocalize with budding
MHV (29). A dibasic motif in the cytoplasmic tail of the
SARS-CoV and MHV spike proteins of the type K(X)KXX
was recently shown to bind COPI through an undetermined
mechanism and is required for efficient interaction with the M
protein (36). Coronavirus M proteins also possess a conserved
dibasic motif in the cytoplasmic tail region, which might
function similarly (see Fig. S3 in the supplemental material).

The ADP-ribosylation factor 4 (ARF4) is a small guanine
nucleotide-binding protein involved in COPI trafficking. It has
been shown that depletion of ARF4, but not of ARF1, ARF3,
ARF5, or ARF6 (ARF2 having been lost in mammalian cells),
induced tubulation at Golgi membranes (64). A similar
phenomenon has been observed in MHV-infected cells and is
linked to E protein expression (see reference 43 and references
therein). ARF4-binding motifs are found in G-protein-coupled
receptors such as rhodopsin, and these generally take the form
of conserved NP(Xn)Y motifs, where Xn typically denotes the
presence of one to three intervening nonconserved residues
(10). A similar XP(X1)Y, XP(X2)Y, or XP(X4)Y motif can be
found in most coronavirus E proteins (see Fig. S3 in the
supplemental material). While we have not mapped the precise
amino acid requirements for E-mediated budding in this study,
two other mutagenesis and reversion studies have identified a
region critical to the function of MHV (14) and TGEV (31) E
proteins that maps to residues 47 to 65 in SARS-CoV E (see
Fig. S3 in the supplemental material). The SARS-CoV E
protein XP(X2)Y or XP(X4)Y motif, KPTVYVY, is found
between residues 53 and 59. We note that the proline and the
C-terminal tyrosine of this motif appear to be highly conserved
among coronaviruses.
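
The XP(Xn)Y pattern discussed above lends itself to a simple regular-expression scan. The Python sketch below is a minimal illustration, not a tool used in this study; the function name and the choice of gap lengths (1, 2, and 4, taken from the motif variants mentioned in the text) are assumptions. A lookahead is used so that overlapping matches are all reported.

```python
import re


def find_xp_xn_y(seq, gaps=(1, 2, 4)):
    """Return sorted (start, gap, match) tuples for XP(Xn)Y occurrences.

    `start` is 1-based and relative to the sequence passed in.
    """
    hits = []
    for n in gaps:
        # Any residue, proline, n arbitrary residues, tyrosine.
        pattern = re.compile(r"(?=(.P.{%d}Y))" % n)
        for m in pattern.finditer(seq):
            hits.append((m.start() + 1, n, m.group(1)))
    return sorted(hits)


# The SARS-CoV E fragment quoted in the text.
hits = find_xp_xn_y("KPTVYVY")
# hits == [(1, 2, 'KPTVY'), (1, 4, 'KPTVYVY')]
```

Applied to the seven-residue fragment, the scan reports both the XP(X2)Y and the XP(X4)Y readings, which is why the text describes KPTVYVY as matching either form. When a full-length protein is scanned instead, the reported start positions are residue numbers in that protein.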

Anticipated proteins that were not detected in this study.
Some peptides were likely missed in our study because of low
solubility, poor proteinase accessibility, or unreported
sequence differences from the Homo sapiens homologs. As
expected for the presently used technique, hydrophobic
transmembrane regions are underrepresented, even among the
proteins that were otherwise unambiguously detected. Thus,
no peptides were recovered from the transmembrane regions
of the S, M, ORF3a, nsp3, and nsp4 proteins. This may also
explain the failure to detect hydrophobic low-copy-number
virion components such as the E, ORF6, and ORF7b proteins.
Proteinase K treatment may have eliminated the detectable
regions of some type I integral membrane proteins with very
small cytoplasmic tail regions such as the ORF7a protein,
which was detected in native but not PK samples (data not
shown). We therefore suspect that proteins from ORF2 to
ORF9b may have been present in at least limited quantity in
purified SARS-CoV preparations and that ORF3b, E, ORF6,
ORF7a, ORF7b, ORF8a, and ORF8b proteins were not
detected here due to technical limitations and the biochemical
properties of these proteins.

Although we detected the ARF4 and COPI proteins, which
are abundant at membranes of the ERGIC (29, 64), which is
the site of SARS-CoV assembly (57), we did not detect any of
the other ERGIC components identified in a recent proteomics
survey (6). Overall, integral membrane proteins and
membrane-associated proteins are underrepresented in the results
of our analysis. While we were unable to exclude the possibility
that low membrane protein detection was due to purely
technical reasons, such as low solubility, limited protease
accessibility, or paucity of trypsin-cleavable fragments of appropriate
molecular weight, our results would appear to corroborate a
previous observation that M protein networks can exclude host
proteins that are present at the site of assembly from the viral
membrane (9).

Novel viral proteins identified. We identified three new
incorporated SARS-CoV proteins (nsp2, nsp3, and nsp5) in PK
virus samples. We were also able to confirm that the ORF9b
protein is incorporated, as was suspected from previously
published results for the MHV I protein (13). We are unable to
determine from the present results whether nsp2 and nsp3
were incorporated as a polyprotein. We were also unable to
exclude the possibility that the nsp’s were associated with other
membrane-bound structures that copurified with virus.
However, the best available evidence suggests that the viral
replicase proteins, with the exception of nsp1, colocalize at the site
of replication (42, 63), and no viral structure has yet been
described which is specifically enriched in nsp2, nsp3, and nsp5.
Therefore, we interpret these results to indicate that the nsp’s
identified by mass spectrometry proteomics analysis were
incorporated in virions.

Two viral proteinases, nsp3 and nsp5, were detected in the
virion. Finding the 1,922-amino-acid,
multiple-membrane-spanning nsp3 in the virion was especially unexpected. nsp3 is
best known for the presence of the highly conserved second
papain-like cysteine proteinase (PL2pro) and ADRP, which
together comprise about one-third of the mass of nsp3. But
how was it incorporated in the virion? The lack of evidence for
incorporated SARS-CoV polymerase, helicase, and nuclease
proteins would appear to rule out efficient copurification of
replicase complexes or double-membraned replicase vesicles
as a source of nsp3. At the time at which this work was started,
no protein-protein interactions involving nsp3 had been
reported, except for PL2pro-mediated cleavage of polyubiquitin
substrates (33). The relative abundance of nsp3 (as estimated
by the frequency of detection and the percent coverage)
appeared to be greater than that of nsp2, ORF3a, ORF6, ORF7a,
and ORF9b proteins, all of which are reported to interact with
nsp3 (65). This observation suggests that viral protein-protein
interaction is probably not the primary mechanism of nsp3
incorporation.

We recently reported that the UB1 domain of nsp3 binds a
discrete ssRNA species and that the adjacent AC domain binds
bacterial dsDNA (53). Here we now report that two additional
domains of SARS-CoV nsp3, i.e., SUD and NAB, exhibit
distinct nucleic acid-binding characteristics. MBD incorporates a
metal ion-binding site which may mediate RNA binding, and
NAB exhibits energy-independent double-stranded nucleic
acid unwinding properties, which would be consistent with
nucleic acid chaperone activity. The presence of conserved
cysteine/histidine clusters between the putative
transmembrane domains (ZF) and at the amino terminus of the Y
domain (Y1) may signal the presence of additional MBDs,
which could increase the total number of nucleic acid-binding
domains in nsp3 to six, i.e., UB1, AC, SUD, NAB, ZF, and Y1.
RNA-binding proteins were also abundant among the detected
host proteins and may have been packaged with genomic RNA
or with incorporated host mRNAs. However, we also note that
several other putative or confirmed SARS-CoV RNA-binding
proteins were not detected in this study, perhaps suggesting
that nsp3 has a specialized role in virogenesis, as was
previously suggested for the PLpro-containing nsp1 of equine
arteritis virus (62).

The mechanism of putative nsp3 incorporation in purified
SARS-CoV preparations remains unclear. Despite the
presence of a host-derived envelope on each virion, host integral
membrane and membrane-associated proteins comprised only
1% and 7%, respectively, of the detected proteins. A much
higher percentage of viral integral and membrane-associated
proteins was detected, including M, S, nsp3, and the ORF3a
and ORF9b proteins. Therefore, while nucleic acid binding
properties may have contributed to the disproportionate
detection of nsp3 in purified virions relative to the adjacent
proteolytic products of pp1a, the transmembrane region of nsp3
may also have contributed to incorporation.

The recent structural characterization of twin
ubiquitin-related domains near the amino terminus of nsp3 (53) and the
present demonstration of the presence of additional
RNA-binding domains in nsp3 have implications for reconstructing
the path of nidovirus replicase evolution. The Nidovirales
encompass the coronaviruses, toroviruses, arteriviruses, and
roniviruses. Replicase polyproteins from these viruses share
conserved domains and common transcriptional mechanisms. As
shown in Fig. 2A, coronaviruses from groups I, II, and III
contain an initial UB1 homolog, followed by either a functional
(group I and IIa) or a vestigial (group III) PL1pro domain
lacking the catalytic histidine found in functional PL1pro and
PL2pro. We interpret the existence of paired UB and PLpro
domains as favoring an evolutionary model in which a
prototypic PLpro gene was duplicated in the last common ancestor of
coronaviruses and subsequent loss of PL1pro occurred in some
coronavirus lineages (73). Two possible mechanisms for the
duplication of the UB and PLpro domains are duplication of a
gene cassette by direct repeat, as observed on a smaller scale in
the AC domain of HCoV-HKU1 nsp3 from various isolates
(Fig. 2) (70), and a recombination event between viruses with
distinct UB and PLpro domains prior to the divergence of the
known coronavirus lineages.

We identified the novel nucleic acid binding domains SUD
and NAB, which are located downstream of UB and PLpro
homologs. SUD and NAB share no detectable sequence
homology. This does not necessarily preclude a structural
relationship, since sequence-based criteria did not predict the
structural homology between the two ubiquitin-related
domains of nsp3, i.e., UB1 and UB2. Further investigation will be
required to determine whether SUD and NAB domains are
the result of duplication of a putative UB-PLpro-nucleic acid
binding protein gene cassette or became embedded in nsp3
independently.

Functional implications. There is growing evidence that the
function of nsp3 is closely tied to association with nucleic acids.
We hypothesize that the functions of the UB1, AC,
SUD, NAB, ADRP, and PL2pro domains could be coordinated
on a complex of protein and single-stranded and
double-stranded RNA, such as the viral replicative form RNA
(recently reviewed in reference 50). The character of coronavirus
RNA replicase activity changes from an early, unstable form
associated with discontinuous negative-strand synthesis to a
later form associated with positive-strand synthesis (49). These
observations suggest that PLpro-mediated cleavage of the
coronavirus polyprotein or of other substrates, such as polyubiquitin or
poly(ADP-ribose), may drive the shift from coronavirus
negative-sense to positive-sense RNA synthesis. A possible location
for this activity would be the template-switching hot spots
mapped to complementary sequences near the 5' genomic
terminus and 3' antigenomic terminus (72). PLpro involvement
in viral RNA synthesis has a precedent in arterivirus, which
requires the multifunctional, multidomain papain-related
proteinase nsp1 for subgenomic RNA transcription but not for
replication (62).

Although the work presented here represents primarily a
starting point for detailed exploration of the overall function of
coronavirus nsp3, we already note similarities to eukaryotic
poly(ADP-ribose) polymerase (PARP) enzymes. Activated
PARP consumes NAD+ to synthesize a polymer of
ADP-ribose that is covalently linked to a target protein (reviewed in
reference 52). The best-characterized member of the family is
PARP-1, which initiates the repair of nicked DNA through two
N-terminal nucleic acid binding zinc fingers and is
auto-poly(ADP-ribosyl)ated on a glutamic acid-rich domain. PARPs can
contain multiple adaptor domains preceding a conserved
C-terminal catalytic domain, which in some cases includes one or
more H2A macrodomains homologous to the ADRP of nsp3,
and also nucleic acid binding domains. The ADRP domain of
SARS-CoV nsp3 has been shown to strongly bind
poly(ADP-ribose), and analysis of the nsp3 Y region yielded several Fold
and Function Assignment System (FFAS) (24) hits on viral
RNA-dependent RNA polymerase and both prokaryotic and
eukaryotic DNA-dependent RNA polymerase domains (based
on FFAS confidence scores of -9.5 and lower; data not
shown). If nsp3 did indeed contain a functional PARP domain,
it could conceivably function in proofreading, genome repair, or
nidovirus-specific discontinuous subgenomic RNA
transcription.